A Comprehensive Isolated Farsi/Arabic Character Database for Handwritten OCR Research
Abstract
This paper presents a new comprehensive database of isolated offline handwritten Farsi/Arabic numerals and characters for optical character recognition (OCR) research. The database is freely available for academic use. To date, no comparable freely available database exists for Farsi. It contains grayscale images of 52,380 characters and 17,740 numerals. All images were scanned at 300 dpi from Iranian school entrance exam forms collected during 2004-2006. The only restriction imposed on the writers was to write each character within a rectangular box. The number of samples per class is non-uniform, reflecting the real-life distribution of each class. For comparison purposes, each dataset has also been divided into training and test sets. Validating the effectiveness of a proposed Farsi (Arabic) OCR system requires comparing it with other approaches. Currently, such a comparison requires reimplementing the competing approaches and evaluating them alongside the proposed method on the same database. A standard database is therefore needed to facilitate research in the field of Farsi (Arabic) OCR.
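The abstract states that each dataset is divided into training and test sets while class sizes follow their real-life, non-uniform distribution. The exact split procedure and ratio are not given here, so the following is only a minimal sketch under an assumption: a stratified split, in which every class is shuffled and divided separately so that rare classes still appear in both sets. The function name `stratified_split`, the 20% test fraction, and the letter-name labels are hypothetical and not taken from the database itself.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test), class by class.

    Hypothetical helper: assumes a stratified split, which the source
    does not confirm. Each class contributes round(n * test_fraction)
    samples to the test set, so class proportions are roughly preserved.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize order within this class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

For example, with hypothetical classes of 50, 30, and 5 samples, `stratified_split(labels, 0.2)` places 10, 6, and 1 sample of each class in the test set. A class with very few samples can still receive no test samples after rounding, so very small classes may need special handling.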
Similar resources
Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs
For many years, researchers have studied high-accuracy methods for handwriting recognition and achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Given the limitations of computer hardware, these methods need to run at high speed. One way to increase processing speed is to use the computer pa...
Multi-Font Farsi/Arabic Isolated Character Recognition Using Chain Codes
Nowadays, OCR systems have many applications and are increasingly used in daily life. Much research has been done on the recognition of Latin, Japanese, and Chinese characters. However, very little has been done on the recognition of Farsi/Arabic characters, probably because identifying those characters is more difficult and complex compared to th...
An HMM-based Farsi OCR
OCR (Optical Character Recognition) is the digital encoding of printed and handwritten characters from an image file created by a scanner or other optical imaging device. In other words, OCR software converts images of text into computerized or digital text (Figure 1). While OCR has been extensively used as the basic application of different learning methods in machine lea...
Hybrid of Rough Neural Networks for Arabic/Farsi Handwriting Recognition
Handwritten character recognition is one of the main areas of research in the field of pattern recognition. In this paper, a hybrid rough neural network model has been developed for recognizing isolated Arabic/Farsi digits. It addresses two neural network problems, proneness to overfitting and the empirical nature of model development, using rough sets and the dissimilarity analy...
Handwritten Arabic Numeral Recognition using a Multi Layer Perceptron
Handwritten numeral recognition is, in general, a benchmark problem in pattern recognition and artificial intelligence. Compared to printed numeral recognition, handwritten numeral recognition is complicated by variations in the shapes and sizes of handwritten characters. Considering all these, the problem of handwritten numeral recognition is addressed under the pres...